Skip to content

Conversation

@ybastide
Copy link
Collaborator

Summary

This pull request refactors and optimizes several components related to robots parsing, aiming to improve maintainability and performance.

Changes

  • Made MyExtensions class internal for encapsulation.
  • Re-added the fast path logic with fixes to ensure correct functionality.
  • Fixed an issue with non-wildcard pattern matching in robots handling.
  • Merged CheckAllow and CheckDisallow to streamline logic.
  • Optimized the longest match strategy for improved performance.
  • Removed System.Memory dependency to reduce unnecessary external dependencies.
  • Applied minor optimizations within RobotsMachine for better overall performance.

Checklist

  • Tests added or updated as needed
  • Documentation updated where applicable
  • Changes have been reviewed and tested

@ybastide ybastide requested a review from Copilot January 25, 2026 15:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors and optimizes the robots.txt parsing module by consolidating logic, improving performance characteristics, and reducing external dependencies.

Changes:

  • Merged CheckAllow and CheckDisallow methods into a single Check method to eliminate code duplication
  • Replaced MatchAllowFast and MatchDisallowFast with a unified MatchFast method that returns match length instead of boolean
  • Optimized pattern matching by introducing a fast path for non-wildcard patterns and moving index.htm normalization to rule creation time

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
RobotsTxt/Extensions.cs Made MyExtensions class internal and added early-exit optimizations for case-insensitive comparisons
RobotsTxt/LongestMatchRobotsMatchStrategy.cs Consolidated matching methods into a single MatchFast that returns match length; replaced boolean-based logic with length-based matching
RobotsTxt/RobotsMachine.cs Merged CheckAllow/CheckDisallow into unified Check method; moved index.htm normalization to rule creation; inlined user agent comparison; changed to length-based pattern matching
RobotsTxt/RobotsTxt.csproj Removed System.Memory package dependency
TestRobotsTxt/TestLongestMatchRobotsMatchStrategy.cs Updated tests to verify match length instead of boolean result

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ybastide ybastide marked this pull request as ready for review January 25, 2026 15:35
@ybastide ybastide requested a review from Copilot January 25, 2026 15:35
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

[InlineData("/", "/", true, 1)]
[InlineData("/", "/$", true, 2)]
[InlineData("a", "b", false, -1)]
[InlineData("/foo/bar", "/bar", false, -1)]
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case verifies that substring matching doesn't occur when it shouldn't (e.g., '/bar' should not match '/foo/bar'). However, there's no corresponding test for the opposite scenario where a pattern should match when it appears at the start of the path. Consider adding a test case like [InlineData("/bar/foo", "/bar", true, 4)] to ensure the fast path correctly matches path prefixes.

Copilot uses AI. Check for mistakes.
Comment on lines +93 to +103
if (userAgent.Length != ua.Length) continue;
bool match = true;
for (int i = 0; i < ua.Length; i++)
{
byte a = userAgent[i];
byte b = ua[i];
if (a == b || (a >= 'A' && a <= 'Z' && a + 32 == b) || (b >= 'A' && b <= 'Z' && b + 32 == a))
continue;
match = false;
break;
}
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This manual case-insensitive comparison duplicates the logic already implemented in MyExtensions.EqualsIgnoreCase(). Consider using the existing extension method instead to reduce code duplication and improve maintainability.

Copilot uses AI. Check for mistakes.
var haveWildcards = value.Length >= 1 && (value.Contains((byte)'*') || value[^1] == '$');
var state = new AllowState(value.ToArray(), haveWildcards);

var fastPath = !value.ContainsAny("*$"u8);
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name 'fastPath' is ambiguous in this context. It's not immediately clear that this flag indicates whether the pattern lacks wildcards. Consider renaming to 'hasNoWildcards' or 'isSimplePattern' to better convey its meaning.

Copilot uses AI. Check for mistakes.
Comment on lines 157 to 158
var fastPath = !value.ContainsAny("*$"u8);
var state = new DisallowState(value.ToArray(), fastPath);
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name 'fastPath' is ambiguous in this context. It's not immediately clear that this flag indicates whether the pattern lacks wildcards. Consider renaming to 'hasNoWildcards' or 'isSimplePattern' to better convey its meaning.

Suggested change
var fastPath = !value.ContainsAny("*$"u8);
var state = new DisallowState(value.ToArray(), fastPath);
var hasNoWildcards = !value.ContainsAny("*$"u8);
var state = new DisallowState(value.ToArray(), hasNoWildcards);

Copilot uses AI. Check for mistakes.
Comment on lines +85 to 87
private static bool Matches(ReadOnlySpan<byte> path, ReadOnlySpan<byte> pattern)
{
Span<int> pos = stackalloc int[path.Length + 1];
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Matches method is now private but appears to be incomplete in the diff. The method allocates a stack buffer but the implementation details are cut off. Ensure that this method body is complete and correctly implements the wildcard matching logic that was previously in MatchesFast.

Copilot uses AI. Check for mistakes.
@ybastide ybastide requested a review from Copilot January 25, 2026 16:07
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

[InlineData("/", "/", true, 1)]
[InlineData("/", "/$", true, 2)]
[InlineData("a", "b", false, -1)]
[InlineData("/foo/bar", "/bar", false, -1)]
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new test case validates that patterns must match from the start of the path. Consider adding a corresponding test in the 'Slow' path to ensure both implementations handle this scenario consistently.

Copilot uses AI. Check for mistakes.
var haveWildcards = value.Length >= 1 && (value.Contains((byte)'*') || value[^1] == '$');
var state = new AllowState(value.ToArray(), haveWildcards);

var fastPath = !value.ContainsAny("*$"u8);
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name 'fastPath' is ambiguous. It should be renamed to 'isSimplePattern' to match the naming used in the AllowState constructor and throughout the codebase.

Copilot uses AI. Check for mistakes.
if (isSimplePattern)
{
return path.IndexOf(pattern) != -1;
return path.StartsWith(pattern) ? pattern.Length : -1;
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simple patterns (no wildcards), the original logic checked if the pattern appeared anywhere in the path using IndexOf. The new logic only checks if the path starts with the pattern. This changes the behavior for non-wildcard patterns and may break existing functionality where patterns like '/bar' should match paths like '/foo/bar'.

Suggested change
return path.StartsWith(pattern) ? pattern.Length : -1;
return path.IndexOf(pattern) >= 0 ? pattern.Length : -1;

Copilot uses AI. Check for mistakes.
@ybastide ybastide force-pushed the refactor/speed-improvements branch from 5029d22 to 813ddf4 Compare January 25, 2026 16:13
@ybastide ybastide merged commit f20c09b into main Jan 25, 2026
1 check passed
@ybastide ybastide deleted the refactor/speed-improvements branch January 25, 2026 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants